Building a framework for fake news detection in the health domain

Disinformation in the medical field is a growing problem that carries significant risk, so it is crucial to detect and combat it effectively. In this article, we provide three elements to aid in this fight: 1) a new framework that collects health-related articles from verification entities and facilitates their check-worthiness and fact-checking annotation at the sentence level; 2) a corpus generated using this framework, composed of 10335 sentences annotated for these two concepts and grouped into 327 articles, which we call KEANE (faKe nEws At seNtence lEvel); and 3) a new model for verifying fake news that combines specific identifiers of the medical domain with subject-predicate-object triplets, using Transformers and feedforward neural networks at the sentence level. This model predicts the fact-checking label of each sentence and evaluates the veracity of the entire article. After training this model on our corpus, we achieved remarkable results in the binary classification of sentences (check-worthiness F1: 0.749, fact-checking F1: 0.698) and in the final classification of complete articles (F1: 0.703). We also tested its performance on another public dataset and found that it performed better than most systems evaluated on that dataset. Moreover, the corpus we provide differs from other existing corpora in its dual sentence-article annotation, which can provide an additional level of justification for the model's prediction of truth or untruth.

Thank you so much for your comments. We have followed the suggestions and indications of the editor and reviewers and have modified the article:
- We have expanded the "Related work" section by including references to various works on embeddings and Transformers applied to disinformation detection. The modifications made are detailed in the response to the second reviewer.
- We have also reviewed the list of references, verifying that none have been retracted.

Responses to the comments of reviewer 1
The authors have responded with sufficient clarity to all the critical issues in the manuscript and I see no impediment to the publication of the article as it is now presented.
- Thank you so much for your comments and feedback. We really appreciate it.

Responses to the comments of reviewer 2
Although the authors answered most of my comments, I still think that the manuscript is missing a large chunk of literature regarding word embeddings [1], transformers [2], and document embeddings [4] for detecting fake information. I recommend that the authors mention some of these research endeavors in their related work section so that the study is complete; otherwise, it would seem that a large chunk of work was ignored.
[1] https://scholar.google.com/scholar?q=word+embeddings+misinformation+detection
[2] https://scholar.google.com/scholar?q=transformers+misinformation
[3] https://scholar.google.com/scholar?q=fake+news+document+embeddings
- Thank you very much for your comments and suggestions. Including these references certainly completes the picture of the different works that have been carried out in this area of research. We have expanded the "Related Work" section within the subsection dedicated to content-based methods, including references to embeddings applied to disinformation detection. We have also added references to works that have used Transformers for this task, either as classifiers or to generate contextual information. The changes made to this subsection and Table 1 are shown in red below:

Document content based: The first strategy considers only the textual content of the article, trying to associate misleading content with certain writing styles [1] using features such as bag-of-words vectors, part-of-speech tags, or Probabilistic Context Free Grammars. Text analysis techniques such as LIWC [2] and discourse-level features that analyze the differences in coherence and structure between deceptive and truthful narratives [3] have also been used as features to detect misleading content. We can even include in this section the Transformer models [4], which allow us to obtain state-of-the-art results with hardly any feature design. Methods such as word embeddings [6] have also been studied; when extended to the entire document, they can capture the context in which a word appears in order to identify typical patterns of disinformation and sensationalist language. It has even been found that generating these embeddings correctly, by training on appropriate data [5], allows simple classification models to obtain results similar or superior to those of more complex models [7]. Another strategy used to detect disinformation based on the content of the document is the use of the currently ubiquitous Transformer models [8]. On the one hand, these models can extract latent features from the text without elaborate feature design and, through transfer learning, apply the information obtained during pre-training, together with fine-tuning on the target datasets, to this detection task [4] [9]. On the other hand, Transformers can also be used to generate contextual embeddings that capture the meaning of words, as well as subtle cues that can characterize misleading information. In these cases, Transformers are typically used as components of an ensemble model [10] that contains other components dedicated to processing intermediate information and to the final classification of documents. These methods can provide good results, although those mentioned at the beginning require careful feature design, and the results of the Transformer models can be very difficult to explain. However, the main disadvantage of all these methods is that they could be circumvented by an agent who mimics the style of legitimate news while incorporating misleading content. It is also worth mentioning in this group the existence of multimodal approaches that include other types of resources, such as the images contained in the article [11] [12].

Table 1: Fake news detection approaches.